Activity Set Model of parallel program behavior and the corresponding parallelism index of a
1. Introduction


Consider a program that is to be executed on a multiprocessor 
containing N identical processors. Each of these processors may itself 
be equipped with one or more array processors attached for its private 
use. 

Amdahl's law [1,2] gives an upper bound of $N/\ln N$ on the actual
speed-up, i.e. the ratio of the elapsed execution time with a single
processor to the elapsed execution time with N processors. Recent
experiments [3] have shown that this may be an unnecessarily pessimistic
estimate and that speed-ups close to N can be obtained for specific
applications.

We have recently derived another approach to estimating the maximum
speed-up [4], assuming an unlimited number of processors, based on a
model which considers the density $p$ of precedence relations between
tasks in programs. We obtain [4] an approximate formula $(1 + p)/(2p)$
for the average maximum speed-up of such a family of programs when the
number of available processors is unlimited. When $p$ is close to 1,
nearly all tasks are interdependent and the programs will in fact
execute sequentially; the speed-up factor will then be close to 1. On
the other hand, when $p$ is very close to 0, we are dealing with programs
composed of many quasi-independent tasks and the maximum speed-up is
very large. An "infinite" speed-up merely means that the average
execution time for the program family remains nearly constant as the
number of individual tasks in the program becomes very large.
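The approximate formula from [4] can be tabulated directly. The short Python sketch below (the function name `max_speedup` is our own) only evaluates $(1 + p)/(2p)$ for a few densities $p$; it does not reproduce the underlying model of [4].

```python
def max_speedup(p: float) -> float:
    """Approximate average maximum speed-up (1 + p) / (2p) with an unlimited
    number of processors; p is the density of precedence relations, 0 < p <= 1."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    return (1.0 + p) / (2.0 * p)


# Tabulate the formula for a few densities, from fully sequential to sparse.
for p in (1.0, 0.5, 0.1, 0.01):
    print(f"p = {p:<5} -> maximum speed-up ~ {max_speedup(p):.2f}")
```

At $p = 1$ the formula gives exactly 1, i.e. sequential execution, while at $p = 0.1$ it already gives 5.5; as $p \to 0$ it grows roughly like $1/(2p)$.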

In this paper we shall begin by considering Amdahl's law and suggest
some amendments that can explain speed-up factors which are nearly
linear in the number of processors. We shall then return to the
intrinsic, processor-independent behavior of parallel programs and
introduce a new model of parallel program behavior, the Activity Set
Model,* which may be used to describe the behavior of parallel programs
and to derive bounds on the speed-up that can be expected when such
programs are executed on multiprocessors.

We first consider a simple amendment to Amdahl's law and derive a
speed-up formula for N processors of the form

$$S < \frac{N}{(1 - \varepsilon)(1 + \delta) + \varepsilon \ln N}$$

where $\varepsilon$ is the probability that the programs cannot
effectively use all N processors, and $\delta$ is a measure of the
imbalance between the workloads of the processors when N processors are
used. Indeed, $\delta/N$ is the amount of time, in excess of the
optimistic equal run-time $1/N$, which the most heavily loaded processor
will take to run the task assigned to it. Thus, when the computation has
been organized so that $\varepsilon$ is very small, the speed-up can be
as high as $N/(1 + \delta)$. These results are detailed in Section 2.

In Section 3, we consider an evaluation of speed-up based on
intrinsic program behavior, which leads to general bounds.

Throughout the discussion we consider a program, or a family of 
programs, whose average run-time on a single processor is equal to 1. 


* The name of this model is, of course, inspired by Peter Denning's
Working Set Model used in paging.
2. Amendment to Amdahl's Law

Assume that the program, or family of programs, under consideration
makes full use of the N processors with probability $(1 - \varepsilon)$.
It will use only $i$ processors ($1 \le i \le N - 1$) with probability
$\varepsilon_i$, where

$$\varepsilon = \sum_{i=1}^{N-1} \varepsilon_i .$$


If it uses N processors, one of these will have the greatest workload,
so that the run-time of the program on N processors is given by the time
elapsed for the most heavily loaded one, or

$$\frac{1}{N} + \frac{\delta}{N} .$$

Obviously, if the processors' workloads were perfectly balanced, the
elapsed time would simply be $1/N$.

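Similarly, when only $i$ processors are used, the elapsed time will be

$$\frac{1}{i} + \frac{\delta_i}{i} ,$$

where $\delta_i$ measures the workload imbalance among the $i$ processors.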
These times should be viewed as average values when a family of 
programs is considered. 

Thus, the average elapsed time when N processors are available is 
given by the formula 


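$$T(N) = (1 - \varepsilon)\left(\frac{1}{N} + \frac{\delta}{N}\right) + \sum_{i=1}^{N-1} \varepsilon_i \left(\frac{1}{i} + \frac{\delta_i}{i}\right)$$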


So the speed-up $T(1)/T(N) = 1/T(N)$ is simply given by

$$S = \frac{N}{(1 - \varepsilon)(1 + \delta) + N \sum_{i=1}^{N-1} \varepsilon_i \left(\frac{1}{i} + \frac{\delta_i}{i}\right)} \tag{1}$$
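Equation (1) is easy to evaluate numerically. The Python sketch below (function and argument names are our own, hypothetical choices) computes $T(N)$ as the weighted average of the elapsed times and returns $S = 1/T(N)$, assuming $T(1) = 1$ as in the text and that the $\varepsilon_i$ and $\delta_i$ are supplied as lists indexed from $i = 1$.

```python
from typing import Sequence


def amended_speedup(N: int, eps: Sequence[float], delta: float,
                    deltas: Sequence[float]) -> float:
    """Speed-up S = 1 / T(N) of equation (1), with T(1) normalised to 1.

    eps[i - 1]:    probability eps_i that only i processors are used (1 <= i <= N-1)
    deltas[i - 1]: imbalance delta_i when i processors are used
    delta:         imbalance when all N processors are used
    """
    if len(eps) != N - 1 or len(deltas) != N - 1:
        raise ValueError("eps and deltas need exactly N - 1 entries")
    e = sum(eps)  # probability of not using all N processors
    if not 0.0 <= e <= 1.0:
        raise ValueError("the eps_i must sum to a value in [0, 1]")
    # T(N): the N-processor run weighted by (1 - e), plus each i-processor run.
    t_n = (1.0 - e) * (1.0 + delta) / N
    t_n += sum(e_i * (1.0 + d_i) / i
               for i, (e_i, d_i) in enumerate(zip(eps, deltas), start=1))
    return 1.0 / t_n


# Example: 8 processors, 10% chance of falling back to 4 processors.
eps = [0.0] * 7
eps[3] = 0.1  # eps_4
print(amended_speedup(8, eps, delta=0.05, deltas=[0.05] * 7))
```

With every $\varepsilon_i = 0$ and $\delta = 0$ the function returns exactly N, the ideal linear speed-up.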


The Most Favorable Case 

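Naturally, the greatest speed-up will be obtained if we set

$$\varepsilon_{N-1} = \varepsilon, \qquad \varepsilon_i = 0 \quad \text{for } 1 \le i \le N - 2 .$$

Substituting into (1) and using $N/(N - 1) \approx 1$, we then have, for
large enough N,

$$S \approx \frac{N}{1 + \delta(1 - \varepsilon) + \varepsilon\,\delta_{N-1}} \tag{2}$$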


This formula explains the quasi-linear speed-ups that can be 
encountered for some specific applications. 
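To see how accurate the large-N approximation is, the sketch below compares equation (1), specialised to the most favorable case, with (2). The function names and the parameter values ($\varepsilon = 0.1$, $\delta = \delta_{N-1} = 0.05$) are illustrative assumptions.

```python
def favorable_exact(N: int, e: float, delta: float, delta_last: float) -> float:
    """Equation (1) with eps_{N-1} = e and eps_i = 0 for 1 <= i <= N-2."""
    t_n = (1 - e) * (1 + delta) / N + e * (1 + delta_last) / (N - 1)
    return 1 / t_n


def favorable_approx(N: int, e: float, delta: float, delta_last: float) -> float:
    """Large-N approximation (2): N / (1 + delta (1 - e) + e delta_{N-1})."""
    return N / (1 + delta * (1 - e) + e * delta_last)


for N in (16, 256, 4096):
    exact = favorable_exact(N, e=0.1, delta=0.05, delta_last=0.05)
    approx = favorable_approx(N, e=0.1, delta=0.05, delta_last=0.05)
    print(f"N={N:5d}  exact={exact:9.2f}  (2)={approx:9.2f}  S/N={exact / N:.3f}")
```

Because $N/(N - 1) > 1$, the exact value is always slightly below (2), and the gap vanishes as N grows; the ratio S/N stays close to $1/1.05 \approx 0.95$, which is the quasi-linear behavior referred to above.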


The Least Favorable Case: Amdahl's Law

We consider that the least favorable case for a family of programs
is obtained by setting

$$\varepsilon_i = 1 - \varepsilon = \frac{1}{N}, \qquad 1 \le i \le N - 1,$$

which implies that we are equally likely to make use of any number of
processors from 1 to N; this is the assumption made in Amdahl's law. We
then have

$$S = \frac{N}{\displaystyle\sum_{i=1}^{N} \frac{1 + \delta_i}{i}}, \qquad \delta_N = \delta \tag{3}$$


Since $\sum_{i=1}^{N} 1/i > \ln N$ and the $\delta_i$ are non-negative,
we have

$$S < \frac{N}{\ln N} \tag{4}$$
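A quick numerical check of (3) against the bound $N/\ln N$ follows. In this sketch (the function name is ours) the list `deltas` holds $\delta_1, \ldots, \delta_N$ with $\delta_N = \delta$, and the bound relies on the harmonic sum exceeding the natural logarithm.

```python
import math


def least_favorable_speedup(deltas):
    """Equation (3): S = N / sum_{i=1}^{N} (1 + delta_i) / i, with delta_N = delta.

    deltas[i - 1] is delta_i; the length of the list fixes N.
    """
    N = len(deltas)
    return N / sum((1 + d) / i for i, d in enumerate(deltas, start=1))


for N in (2, 8, 64, 1024):
    s = least_favorable_speedup([0.0] * N)  # perfectly balanced workloads
    print(f"N={N:5d}  S={s:8.2f}  N/ln N={N / math.log(N):8.2f}")
```

With every $\delta_i = 0$, S reduces to $N/H_N$, where $H_N$ is the N-th harmonic number; because $H_N > \ln N$, S always stays below $N/\ln N$, and any positive $\delta_i$ only lowers S further.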
Amendment to Amdahl's Law

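Let us consider a family of programs that can fully use all N
processors with probability $1 - \varepsilon$. If a program in this
family cannot make full use of them, assume that it is equally likely to
use $N - 1, N - 2, \ldots$, or just one processor. We then have
$\varepsilon_i = \varepsilon/(N - 1)$, $1 \le i \le N - 1$, so that,
since $\sum_{i=1}^{N-1} 1/i > \ln(N - 1)$, we obtain

$$S < \frac{N}{(1 - \varepsilon)(1 + \delta) + \dfrac{N\varepsilon}{N - 1}\left(\ln(N - 1) + \displaystyle\sum_{i=1}^{N-1} \frac{\delta_i}{i}\right)} \tag{5}$$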


For large N, this bound becomes

$$S < \frac{N}{(1 - \varepsilon)(1 + \delta) + \varepsilon \ln N} \tag{6}$$


which is a useful compromise between the unnecessarily pessimistic form
of Amdahl's law and the overly optimistic linear bound.

In (6), program characteristics appear via the parameters $\varepsilon$
and $\delta$, where $\varepsilon$ is the probability that a program is
unable to make use of all N processors, while $\delta$ measures the
imbalance between the average execution times of parallel tasks. In the
sequel we shall consider a simple representation of a parallel program's
execution in virtual time.
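The compromise bound can be compared with the exact speed-up of the uniform case $\varepsilon_i = \varepsilon/(N - 1)$ by plugging that choice into equation (1). In the Python sketch below the function names and the values $N = 1024$, $\delta = \delta_i = 0.05$ are illustrative assumptions.

```python
import math


def uniform_partial_speedup(N, e, delta, deltas):
    """Equation (1) with eps_i = e / (N - 1) for 1 <= i <= N - 1 (the case of (5)).

    deltas[i - 1] is delta_i for i = 1, ..., N - 1.
    """
    t_n = (1 - e) * (1 + delta) / N
    t_n += (e / (N - 1)) * sum((1 + d) / i for i, d in enumerate(deltas, start=1))
    return 1 / t_n


def compromise_bound(N, e, delta):
    """Bound (6): N / ((1 - e)(1 + delta) + e ln N)."""
    return N / ((1 - e) * (1 + delta) + e * math.log(N))


N = 1024
for e in (0.0, 0.01, 0.1, 0.5, 1.0):
    s = uniform_partial_speedup(N, e, 0.05, [0.05] * (N - 1))
    print(f"e={e:4.2f}  S={s:8.1f}  bound (6)={compromise_bound(N, e, 0.05):8.1f}")
```

At $\varepsilon = 0$ both expressions reduce to $N/(1 + \delta)$; for any $\varepsilon > 0$ the exact value lies strictly below (6), and $\varepsilon = 1$ gives an Amdahl-like value near $N/\ln N$.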


3. The Activity Set Model of Parallel Program Behavior 


Consider a program that is being executed with an unlimited number
of processors. Its behavior may be characterized by a variable $n(t)$
representing the number of active processes or tasks at some instant
$t$, lying between the instant $t = 0$ when the program execution is
initiated and the instant $t = T$ when it ends. Such a behavior is shown
in Figure 1; $n(t)$ is the size of the Activity Set at time $t$.



Figure 1. Size of the Activity Set of a parallel program as a function
of virtual time t, between t = 0 when the program execution begins and
t = T when it ends.


The total work

$$W = \int_0^T n(t)\,dt \tag{7}$$

represents the amount of computational effort that must be accomplished
by any set of processors in order to execute the program.

The Activity Set of the program at time t is the set of parallel 
processes or tasks that are running simultaneously at time t. 





The parallelism index $N_0$,

$$N_0 = \frac{1}{T}\int_0^T n(t)\,dt = \frac{W}{T}, \tag{8}$$

is the average number of active processes in the program, or the average
number of processors that it can use simultaneously.
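When $n(t)$ is piecewise constant, the integrals (7) and (8) reduce to sums. The Python sketch below assumes, purely for illustration, that the profile is given as a list of hypothetical (duration, n) segments in virtual time; the function names are our own.

```python
from typing import List, Tuple

# A piecewise-constant activity profile: (duration, n) pairs meaning that
# n tasks are active for `duration` units of virtual time (hypothetical data).
Profile = List[Tuple[float, int]]


def total_work(profile: Profile) -> float:
    """W = integral of n(t) over [0, T], equation (7)."""
    return sum(duration * n for duration, n in profile)


def parallelism_index(profile: Profile) -> float:
    """N_0 = W / T, equation (8)."""
    T = sum(duration for duration, _ in profile)
    return total_work(profile) / T


profile = [(2.0, 1), (3.0, 6), (1.0, 2), (4.0, 10)]
print(total_work(profile), parallelism_index(profile))
```

For this made-up profile T = 10 and W = 62, so $N_0 = 6.2$.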

Time t in the expressions above should be considered to be virtual 
time, or program execution time from which the time spent for all 
interruptions (paging, I/O, other programs' execution time, etc.) has
been deleted. 

If the program is to be run on a single processor whose speed is c 
times faster than any one of the initial set of processors, the program 
will now run sequentially in time 

$$T(c, 1) = \frac{W}{c} = \frac{N_0 T}{c} \tag{9}$$

If, on the other hand, we are limited to running the program on N 
processors, each of which has the same power as one of the processors in 
the initial unlimited set, we can derive some bounds on the execution
time.

Consider the quantity 

$$C(N) = \int_0^T \bigl(N - n(t)\bigr)^+\,dt ,$$

where $(x)^+ = x$ if $x \ge 0$ and $(x)^+ = 0$ if $x < 0$.

C(N) denotes the amount of additional work that N processors could 
provide during the interval [0,T], or their excess capacity, when the 
program is executed with an unlimited number of processors. It is shown 
as the shaded area of Figure 1. Similarly, 
$$D(N) = \int_0^T \bigl(n(t) - N\bigr)^+\,dt$$

is the work accomplished by the processors in excess of N when the
total number of processors available is unlimited.
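For the same kind of hypothetical piecewise-constant profile as a list of (duration, n) segments, C(N) and D(N) are again simple sums. The names in this Python sketch are our own, and the example profile is made up.

```python
from typing import List, Tuple

Profile = List[Tuple[float, int]]  # (duration, n) pairs, hypothetical format


def excess_capacity(profile: Profile, N: int) -> float:
    """C(N) = integral of (N - n(t))^+ dt: idle capacity of N processors."""
    return sum(d * max(N - n, 0) for d, n in profile)


def excess_work(profile: Profile, N: int) -> float:
    """D(N) = integral of (n(t) - N)^+ dt: work done beyond N processors."""
    return sum(d * max(n - N, 0) for d, n in profile)


profile = [(2.0, 1), (3.0, 6), (1.0, 2), (4.0, 10)]
N = 4
C, D = excess_capacity(profile, N), excess_work(profile, N)
W = sum(d * n for d, n in profile)
T = sum(d for d, _ in profile)
# Sanity check: W = N*T - C(N) + D(N), since n = N - (N - n)^+ + (n - N)^+.
assert abs(W - (N * T - C + D)) < 1e-12
print(C, D)
```

With N = 4 this profile gives C(N) = 8 and D(N) = 30; the assertion checks the identity $W = NT - C(N) + D(N)$, which follows from integrating $n = N - (N - n)^+ + (n - N)^+$.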

When the total number of processors is limited to N, the total 
execution time T(N) cannot be smaller than 

$$T(N) \ge T + \frac{[D(N) - C(N)]^+}{N}$$

since in the best case the excess capacity C(N) will be used to 
accommodate as much excess work D(N) as the processors can take. 

Similarly, an upper bound to T(N) can be derived by considering 
that all of the work D(N) will be accomplished on the N processors after 
time T: 


$$T(N) \le T + \frac{D(N)}{N}$$


The speed-up obtained by using N processors instead of a single
processor (of speed c = 1) can now be estimated from these bounds. It is
given by the formula $S = T(1)/T(N)$, where by (9) $T(1) = W = N_0 T$,
so that

$$\frac{N_0 T}{T + D(N)/N} \le S \le \frac{N_0 T}{T + [D(N) - C(N)]^+/N} \tag{10}$$

or

$$\frac{N_0 N}{N + D(N)/T} \le S \le \frac{N_0 N}{N + [D(N) - C(N)]^+/T} \tag{11}$$
From these inequalities we see quite clearly that S can never exceed
$N_0$, which should be obvious, but also that $S/N_0$ is bounded from
above by the multiplicative factor

$$\frac{1}{1 + \dfrac{[D(N) - C(N)]^+}{NT}}$$

and from below by the multiplicative factor

$$\frac{1}{1 + \dfrac{D(N)}{NT}} .$$

3.1 A Statistical Interpretation

Clearly, the precise behavior of a program as given by the function
n(t) is in general very difficult to predict, since it will obviously be
data dependent. Thus, it is quite natural to treat n(t) as a random
function, so that $N_0$, $D(N)/T$ and $C(N)/T$ now have convenient
statistical interpretations in terms of the expected value $E[\cdot]$
taken over the finite time interval $[0, T]$:

$$N_0 = E[n(t)], \qquad \frac{D(N)}{T} = E\bigl[(n(t) - N)^+\bigr], \qquad \frac{C(N)}{T} = E\bigl[(N - n(t))^+\bigr] .$$


In particular, we shall now write

$$\frac{S}{N_0} \ge \frac{1}{1 + \dfrac{1}{N}\,E\bigl[(n(t) - N)^+\bigr]} \tag{12}$$
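Bound (12) only needs the distribution of n(t). The Python sketch below evaluates it for any discrete distribution supplied as a dictionary of probabilities; the function name and the uniform example over {1, ..., 8} are hypothetical choices for illustration.

```python
def ratio_lower_bound(pmf: dict, N: int) -> float:
    """Right-hand side of (12): 1 / (1 + E[(n(t) - N)^+] / N).

    pmf maps each possible value n of n(t) to its probability P[n(t) = n].
    """
    excess = sum(p * max(n - N, 0) for n, p in pmf.items())
    return 1.0 / (1.0 + excess / N)


# Hypothetical example: n(t) uniformly distributed over {1, ..., 8}.
pmf = {n: 1 / 8 for n in range(1, 9)}
N0 = sum(n * p for n, p in pmf.items())  # parallelism index E[n(t)]
for N in (1, 2, 4, 8):
    print(f"N={N}  S >= {N0 * ratio_lower_bound(pmf, N):.2f}")
```

For this uniform example $N_0 = 4.5$, and the bound gives S ≥ 1 at N = 1 and S ≥ 4.5 once N ≥ 8.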
3.2 A Numerical Example

Within the framework of the statistical interpretation of n(t),
consider the case where it is time independent and its distribution is
geometric over the interval $[0, T]$:

$$P[n(t) = i] = q^{i-1}(1 - q), \qquad i \ge 1 .$$

This example describes a situation in which the program is always more
likely to have a smaller number of active parallel tasks. Clearly, q
must be chosen so that $E[n(t)] = N_0$; hence $N_0 = 1/(1 - q)$, or

$$q = \frac{N_0 - 1}{N_0} .$$

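We then have

$$E\bigl[(n(t) - N)^+\bigr] = \sum_{i=N}^{\infty} (i - N)\,q^{i-1}(1 - q) = \frac{q^N}{1 - q} = N_0\left(1 - \frac{1}{N_0}\right)^{N} .$$

Hence

$$\frac{S}{N_0} \ge \frac{1}{1 + \dfrac{N_0}{N}\left(1 - \dfrac{1}{N_0}\right)^{N}} \tag{13}$$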
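The closed form used in (13) can be checked against a direct, truncated summation of the series. In the Python sketch below the function names and the truncation length `terms` are our own assumptions; `speedup_lower_bound` returns $N_0$ times the right-hand side of (13).

```python
def geometric_excess_series(N0: float, N: int, terms: int = 5000) -> float:
    """E[(n(t) - N)^+] for P[n(t) = i] = q^(i-1) (1 - q), by truncated summation."""
    q = (N0 - 1) / N0
    return sum((i - N) * q ** (i - 1) * (1 - q) for i in range(N, N + terms))


def geometric_excess_closed(N0: float, N: int) -> float:
    """Closed form q^N / (1 - q) = N0 (1 - 1/N0)^N."""
    return N0 * (1 - 1 / N0) ** N


def speedup_lower_bound(N0: float, N: int) -> float:
    """Lower bound on S from (13): N0 / (1 + (N0 / N)(1 - 1/N0)^N)."""
    return N0 / (1 + (N0 / N) * (1 - 1 / N0) ** N)


for N0 in (2, 8, 32):
    row = [round(speedup_lower_bound(N0, N), 2) for N in (1, 2, 4, 8, 16, 32, 64)]
    print(f"N0={N0:2d}  {row}")
```

The series and the closed form agree (for example, both give 2.56 for $N_0 = 5$, N = 3). The bound equals 1 at N = 1 for every $N_0$ and approaches $N_0$ as N grows.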


Figure 2 shows the lower bound on the speed-up S given by (13), as a
function of the number of processors N, for various values of $N_0$
(the parallelism index).
Figure 2. The lower bound for the speed-up S as a function of N for
various values of $N_0$, as provided by equation (13).

4. Conclusions

In this note we have considered some amendments to Amdahl's law
which take into account the fact that programs may be able to make
effective use of all of the processors available to them, but which also
recognize that imbalance in the partition of the workload between
processors reduces the speed-up one could expect.

We have then suggested examining the speed-up issue in terms of a
representation of the intrinsic behavior of parallel program execution,
which we call the Activity Set Model. This model describes the set of
simultaneously active parallel processes as a function of program
virtual time. We have shown how the size of the Activity Set can be used
to derive bounds on the speed-up of the program as a function of the
number of processors available to it.